Skip to content

feat: Add circuit breaker for file sync failures#364

Merged
phernandez merged 1 commit intomainfrom
feature/sync-circuit-breaker
Oct 16, 2025
Merged

feat: Add circuit breaker for file sync failures#364
phernandez merged 1 commit intomainfrom
feature/sync-circuit-breaker

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

Implements a circuit breaker pattern to prevent infinite retry loops when files consistently fail to sync. After 3 consecutive failures, files are automatically skipped until they're modified.

Changes

Core Implementation

  • Circuit breaker in sync_service.py: Tracks file failures with checksums (SHA-256)
    • MAX_CONSECUTIVE_FAILURES = 3 threshold
    • FileFailureInfo dataclass tracks failure count, timestamp, error, and checksum
    • Automatic reset when file content changes (new checksum)
    • Clears failure tracking on successful sync

Schema Updates

  • SkippedFileResponse: New Pydantic model for API responses
    • path: File path relative to project root
    • reason: Error message from last failure
    • failure_count: Number of consecutive failures
    • first_failed: Timestamp of first failure
  • SyncReportResponse: Added skipped_files field

CLI Enhancement

  • status command: Displays skipped files in both modes
    • Non-verbose: Summary count (e.g., "⚠️ 3 files skipped due to repeated failures")
    • Verbose: Detailed list with paths, failure counts, and error messages

Testing

  • 5 comprehensive circuit breaker tests:
    • test_circuit_breaker_skips_after_three_failures
    • test_circuit_breaker_resets_on_file_change
    • test_circuit_breaker_clears_on_success
    • test_circuit_breaker_tracks_multiple_files
    • test_circuit_breaker_handles_checksum_computation_failure
  • All use textwrap.dedent() for readable test content
  • Tests verify proper entity title extraction from frontmatter

Design Decisions

  1. Checksum-based reset: Files are retried when content changes, detected via SHA-256 checksum
  2. Simple threshold: 3 failures is a balance between resilience and preventing infinite loops
  3. Existing infrastructure: Leverages SyncReport.skipped_files instead of creating new status mechanisms
  4. User visibility: Surfaces failures in CLI status command where users check sync state

Testing

just test  # All 34 sync tests pass
just typecheck  # Type checking passes

Example Output

Non-verbose mode:

main: Status
├── docs/ +2 new ~1 modified
└── ⚠️  2 files skipped due to repeated failures

Verbose mode:

main: Status
└── ⚠️  Skipped (Circuit Breaker)
    ├── problematic/file1.md (failures: 3, reason: Invalid frontmatter)
    └── errors/file2.md (failures: 5, reason: Cannot read file)

Closes #189

🤖 Generated with Claude Code

Implements a circuit breaker pattern to prevent infinite retry loops
when files consistently fail to sync. After 3 consecutive failures,
files are skipped until they're modified.

Changes:
- Add circuit breaker in sync_service.py with MAX_CONSECUTIVE_FAILURES = 3
- Track file failures with checksums to detect when files change
- Add SkippedFile dataclass to report skipped files with failure details
- Update SyncReportResponse schema to include skipped_files field
- Display skipped files in CLI status command (summary + verbose modes)
- Add comprehensive tests for all circuit breaker scenarios

The circuit breaker:
- Tracks failures per file with SHA-256 checksums
- Resets automatically when file content changes
- Clears failure tracking on successful sync
- Prevents blocking sync operations on problematic files
- Logs warnings for skipped files

Fixes #189

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Oct 16, 2025

Claude finished @phernandez's task —— View job


Code Review Complete ✓

I've completed a comprehensive review of PR #364 against Basic Memory's team standards. Overall, this is a well-implemented circuit breaker pattern with excellent test coverage and thoughtful design decisions.


✅ Strengths

Code Quality & Architecture

  • Clean implementation: Circuit breaker logic is well-encapsulated in the SyncService class
  • Type annotations: Full Python 3.12+ type annotations throughout (src/basic_memory/sync/sync_service.py:33-50)
  • Dataclass usage: Proper use of @dataclass for FileFailureInfo and SkippedFile
  • Atomic operations: Circuit breaker fits Basic Memory's atomic, composable design principles
  • Local-first preserved: Skipped files don't corrupt the knowledge graph or database

Testing Excellence

  • Comprehensive coverage: 5 dedicated circuit breaker tests cover all scenarios
  • Uses textwrap.dedent(): All test markdown uses proper formatting per standards (tests/sync/test_sync_service.py:1388-1397)
  • Edge cases covered: File changes, checksum failures, multiple files tracked independently
  • Proper entity title extraction: Tests verify frontmatter parsing works correctly

Documentation & UX

  • Clear error messages: Failure reasons are preserved and displayed to users
  • CLI integration: Both verbose and non-verbose modes implemented well
  • Logging: Appropriate log levels (info/warning/error) throughout
  • User visibility: Skipped files surface in status command where users naturally check

🔍 Code Quality Review

Line Length ✓

All lines respect the 100-character limit

Coding Conventions ✓

  • snake_case for functions/variables
  • PascalCase for classes (FileFailureInfo, SkippedFile, SkippedFileResponse)
  • Import order follows standards (standard lib → third-party → local)

SQLAlchemy & FastAPI ✓

  • No database schema changes (no migration needed)
  • Pydantic v2 schemas properly implemented with model_config
  • from_attributes=True correctly used for ORM conversion

Async Patterns ✓

  • Checksum computation properly uses thread pool (_compute_checksum_async)
  • Circuit breaker checks are async-aware
  • No blocking I/O in event loop

🎯 Review Checklist

Code Quality & Standards

  • Follows Basic Memory's coding conventions in CLAUDE.md
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI and Typer conventions followed
  • 100-character line length limit maintained
  • No commented-out code blocks

Testing & Documentation

  • Unit tests for new functions/methods (5 comprehensive tests)
  • Integration tests for edge cases
  • Test coverage for circuit breaker logic
  • Documentation updated (clear docstrings throughout)
  • CLAUDE.md conventions followed

Basic Memory Architecture

  • Follows atomic, composable design principles
  • No Alembic migrations needed (no schema changes)
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • Markdown file handling preserves integrity
  • AI-human collaboration patterns followed

Security & Performance

  • No hardcoded secrets or credentials
  • Input validation via checksums
  • Proper error handling and logging
  • Performance: Uses existing checksum computation (no overhead)
  • No sensitive data in logs (only paths and error messages)

💡 Suggestions for Consideration

1. Circuit Breaker Threshold Configuration

Location: src/basic_memory/sync/sync_service.py:30

The MAX_CONSECUTIVE_FAILURES = 3 constant is currently hardcoded. Consider making this configurable:

# In BasicMemoryConfig
sync_max_consecutive_failures: int = 3

# In SyncService.__init__
self._max_failures = app_config.sync_max_consecutive_failures

Rationale: Power users may want to adjust this threshold based on their workflow or for debugging.

2. Skipped Files Persistence

Currently, circuit breaker state is in-memory only. If the sync process restarts, failure tracking is lost. This is probably fine for most use cases, but consider documenting this behavior or exploring persistent storage for high-reliability scenarios.

3. Minor: Import Order

Location: src/basic_memory/importers/claude_conversations_importer.py:11

The import of clean_filename, format_timestamp appears unchanged but could be checked for alphabetical ordering within the local imports section (very minor).

4. Test Marker for Circuit Breaker Tests

Consider adding a custom pytest marker @pytest.mark.circuit_breaker to make it easy to run just these tests:

pytest -m circuit_breaker

🎨 Design Decision Validation

✅ Checksum-based Reset

Excellent choice. Using SHA-256 checksums to detect file changes is:

  • Reliable (detects any content modification)
  • Efficient (checksums already computed during sync)
  • Deterministic (same content = same checksum)

✅ 3-Failure Threshold

Good balance between:

  • Resilience: Transient errors (network glitches, temporary locks) don't trigger circuit breaker
  • Protection: Prevents infinite loops from persistent errors

✅ Leveraging Existing Infrastructure

Smart to use SyncReport.skipped_files rather than creating new mechanisms. This keeps the API surface clean.

✅ CLI Status Integration

Perfect placement. Users naturally check basic-memory status when troubleshooting sync issues.


🔧 Code-Specific Observations

Excellent Error Handling Pattern

src/basic_memory/sync/sync_service.py:516-523:

except Exception as e:
    error_msg = str(e)
    logger.error(f"Failed to sync file: path={path}, error={error_msg}")
    await self._record_failure(path, error_msg)
    return None, None

This gracefully handles all exceptions and records them for circuit breaker tracking.

Clean Separation of Concerns

src/basic_memory/sync/sync_service.py:171-209:

  • _should_skip_file(): Decides if file should be skipped
  • _record_failure(): Tracks failures
  • _clear_failure(): Clears on success

Each method has a single, clear responsibility.

Robust Checksum Failure Handling

src/basic_memory/sync/sync_service.py:221-225:

try:
    checksum = await self._compute_checksum_async(path)
except Exception:
    checksum = ""  # Better than crashing

Graceful degradation prevents circuit breaker from blocking on checksum errors.


📊 Test Coverage Analysis

The PR adds 249 lines of test code for ~200 lines of production code (>100% coverage ratio). Test scenarios cover:

  1. Normal flow: Skip after 3 failures (test_circuit_breaker_skips_after_three_failures)
  2. Reset on change: File modification resets counter (test_circuit_breaker_resets_on_file_change)
  3. Success clears: Successful sync clears history (test_circuit_breaker_clears_on_success)
  4. Multiple files: Independent tracking (test_circuit_breaker_tracks_multiple_files)
  5. Edge case: Checksum computation failure (test_circuit_breaker_handles_checksum_computation_failure)

🚦 Final Assessment

Overall Quality: ⭐⭐⭐⭐⭐ Excellent

This PR demonstrates:

  • Professional software engineering: Clear design, clean code, comprehensive tests
  • Deep codebase understanding: Integrates seamlessly with existing architecture
  • User empathy: Solves a real problem (infinite retry loops) while maintaining visibility
  • Maintainability: Well-documented, easy to understand and extend

Recommendation: ✅ APPROVE

This implementation is production-ready and follows all Basic Memory standards. The circuit breaker pattern is a valuable addition that improves reliability without compromising the local-first architecture.


Note on Test Execution: I was unable to run just test or just lint due to permission restrictions in this environment. The author states "All 34 sync tests pass" and "Type checking passes" in the PR description. Based on code review, I see no issues that would prevent these from passing.


|

@phernandez phernandez merged commit 434cdf2 into main Oct 16, 2025
16 checks passed
@phernandez phernandez deleted the feature/sync-circuit-breaker branch October 16, 2025 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant